The nature and validity of implicit bias training for health care providers and trainees: A systematic review

The number of health care educational institutions/organizations adopting implicit bias training is growing. Our systematic review of 77 studies (published 1 January 2003 through 21 September 2022) investigated how implicit bias training in health care is designed/delivered and whether gaps in knowledge translation compromised the reliability and validity of the training. The primary training target was race/ethnicity (49.3%); trainings commonly lack specificity on addressing implicit prejudice or stereotyping (67.5%). They involved a combination of hands-on and didactic approaches, lasting an average of 343.15 min, often delivered in a single day (53.2%). Trainings also exhibit translational gaps, diverging from current literature (10 to 67.5%), and lack internal (90.9%), face (93.5%), and external (100%) validity. Implicit bias trainings in health care are characterized by bias in methodological quality and translational gaps, potentially compromising their impacts.


INTRODUCTION
Health care providers' implicit racial bias (spontaneously activated attitudes and beliefs) is one major factor contributing to racial disparities in the quality of patient care (1-3). Emerging data also suggest that other forms of implicit bias (e.g., weight, disability, and sexuality) undermine patients' health care experiences. Many U.S. educational institutions and organizations responsible for training health care providers have made increasing efforts to address implicit bias. For example, in 2020, the American Medical Association renewed its pledge to combat implicit bias in medicine and announced that it would lobby medical schools to implement implicit bias training (4). Legislators have also pushed to mandate implicit bias training for health care providers. Currently, eight states mandate implicit bias training as part of continuing medical education requirements (5). Furthermore, the Black Maternal Health Momnibus Act of 2023 calls for the provision of "funding for grant programs to implement and study consistent bias, racism, and discrimination trainings for all employees in maternity care settings" (6,7). Efforts to address implicit bias in health care are not limited to the United States. In response to recent student activism, several medical schools in the United Kingdom made commitments to implement implicit bias training (8).
Despite the increasing number of health care educational institutions and organizations adopting implicit bias training, there are no guidelines for the development and implementation of such training. Further, the training content often does not reflect current scientific knowledge about implicit bias and its role in the quality of patient care. For example, many implicit bias trainings tend to focus only on the affective component of implicit bias (prejudice). While implicit prejudice is consistently linked to the quality of patient-provider communication (9-12), multiple recent reviews (2,13,14) found little evidence to support a link between provider implicit prejudice and provider treatment recommendations or final decisions. Lastly, there is no evidence these trainings result in long-term behavioral change.
In their call to action, Hagiwara and colleagues (15) posited that the development and implementation of effective implicit bias training must be (i) grounded in the knowledge gained through social psychology research on implicit bias and (ii) executed in incremental stages within the Clinical and Translational Science (CTS) framework. The five stages in the CTS range from T0 (basic science research) to T4 (translation to community) (16). They argued any translational gap between the stages drastically attenuates the effectiveness of implicit bias training. Although Hagiwara and colleagues posited that T1 (translation to humans) to T3 (translation to practices) are particularly relevant to implicit bias training (15), we argue that T4 is a critical stage for ensuring generalizability of training and maximizing its impact. The goal of this systematic review was twofold. First, we investigated how implicit bias training for health care trainees and providers is designed and delivered. Second, we assessed whether the reliability and validity of the implicit bias training were compromised by translational gaps. Recognizing where and what translational gaps exist is essential for improving implicit bias training and ultimately achieving health care equity.

RESULTS
Study selection
The initial searches generated 14,183 results, which were reduced to 9424 abstracts after removing duplicates and studies published before 2003. During the abstract screening, 8950 records were excluded because the abstracts did not mention implicit bias training, and 76 records were excluded because they were conference abstracts. We screened 398 full-text articles and excluded 336 studies that did not meet all inclusion criteria, resulting in 62 studies. The 1-year follow-up searches generated 1663 results, which were reduced to 1190 without duplicates. Of these, 1126 and 28 manuscripts were removed because their abstracts did not mention implicit bias or they were conference abstracts, respectively. Twenty-one studies not meeting all inclusion criteria were removed during full-text screening, resulting in 15 studies. The current review included 77 studies (Fig. 1 and table S2).
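The screening counts reported above can be checked with a few subtractions. The following is a minimal sketch in Python that uses only the numbers stated in this section; the helper name `remaining` is hypothetical and is not part of the review's methods.

```python
def remaining(start, *exclusions):
    """Subtract each exclusion count from the starting number of records."""
    result = start
    for excluded in exclusions:
        result -= excluded
    return result


# Initial search: abstracts screened, minus abstracts not mentioning
# implicit bias training (8950) and conference abstracts (76).
initial_full_text = remaining(9424, 8950, 76)
# Full-text articles, minus studies not meeting all inclusion criteria (336).
initial_included = remaining(initial_full_text, 336)

# 1-year follow-up search: same two abstract-level exclusions (1126, 28),
# then full-text exclusions (21).
followup_full_text = remaining(1190, 1126, 28)
followup_included = remaining(followup_full_text, 21)

total_included = initial_included + followup_included

print("Initial search, full texts screened:", initial_full_text)
print("Initial search, studies included:", initial_included)
print("Follow-up search, full texts screened:", followup_full_text)
print("Follow-up search, studies included:", followup_included)
print("Total studies included:", total_included)
```

The totals reproduce the 398 full-text articles, the 62 and 15 included studies, and the final 77 studies reported in the text.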

Study characteristics
The first study was published in 2008, and the number of published studies has rapidly increased over the years (Fig. 2).

Study design
The most common research design used was quantitative (n = 38, 49.4%), followed by mixed methods (n = 28, 36.4%) and qualitative designs (n = 11, 14.3%). In 35 studies (45.5%), implicit bias training was delivered as part of the regular curriculum; in the remaining 42 studies (54.5%), training was delivered in addition to the regular curriculum. While implicit bias was the main training goal in 41 studies (53.2%), it was part of larger goals in 35 studies (45.5%; n = 1 unclear). Some common larger goals included cultural sensitivity and/or competency, DEI (diversity, equity, and inclusion), antiracism, and health disparities.

Participants
Sample sizes ranged from 12 to 1250, representing all professional statuses (i.e., students, residents, fellows, and faculty/practicing clinicians) in 16 clinical specialties.

Risk of bias in studies
The mean score of the Medical Education Research Study Quality Instrument (MERSQI) (sum of 10 items) of 66 quantitative (table S3) and mixed methods studies (table S4) was 8.95 (SD = 2.85; range, 5.5 to 16). Only 14 studies (21.2%) were considered low risk of bias. In addition, only four studies (6.0%) demonstrated full validation of the assessment measures.
The mean score of the Critical Appraisal Skills Programme (CASP) (percentage of "yes" across 10 items) of 39 qualitative (table S5) and mixed-methods studies was 45.90% (SD = 28.54%; range, 0 to 90%). Only 11 studies (28.2%) were considered low risk of bias. Notably, less than half of the studies (48.7%) clearly defined their qualitative study aims. Consequently, we were unable to determine the appropriateness of the qualitative methodological approach, research design, and the theoretical underpinnings for those studies.

Results of individual studies and syntheses
Data S1 presents characteristics of individual implicit bias trainings reported in the 77 studies.

Training characteristics
Table 1 provides the summary of training characteristics stratified by low versus moderate to high bias risk. Overall, the bias component, the targets of bias, and the training structure (format, duration, and frequency) were similar between the two groups. However, there were several notable differences.

Bias component
The proportion of studies that did not specify the component of implicit bias (i.e., prejudice, stereotyping, or both) that their training aimed to address was higher among moderate to high (72.7%) than low bias risk studies (54.5%). In addition, only two trainings reported in the low bias risk studies addressed implicit stereotyping only (9.1%).

Bias target
The proportion of studies that did not specify the target of bias was higher in the moderate to high (27.3%) than in the low bias risk group (13.6%). The four most commonly addressed targets were the same between the two groups (i.e., race/ethnicity, weight, gender, and sexual orientation/gender identity). Some trainings reported in moderate to high bias risk studies also focused on socioeconomic status (n = 5, 9.1%) or gave participants the option to select one target from several (n = 1, 1.8%). It should be noted that the numbers for the bias target do not add up to 100% because some training programs targeted multiple groups (13.6 and 16.4% of low versus moderate to high bias risk studies, respectively).

Training structure
All three aspects of training structure (format, duration, and frequency) were similar between low and moderate to high bias risk studies: Trainings typically involved a combination of hands-on activities and didactic presentations (75.3%), which on average lasted 343.15 min (SD = 519.45) and were delivered in a single day (53.2%). However, there were three notable differences between the two groups. First, trainings reported in low bias risk studies tended to last longer (M = 409.31) than those in moderate to high bias risk studies (M = 308.51). Second, there were more trainings reported in moderate to high bias risk studies that used a combination of hands-on activities and didactic presentations (80.0%) than in low bias risk studies (63.6%). Last, all three trainings that used didactic presentations alone were low bias risk studies.

Translational gaps
Table 2 provides the summary of translational gaps across stages T1 through T4 stratified by the levels of study bias risk.

T1-T2 translational gap
A gap between T1 and T2 reflects inconsistencies between the scientific evidence of implicit bias and how implicit bias training is designed. As noted in the introduction, implicit prejudice predicts the quality of patient-provider communication but not the quality of treatment recommendations (2,13,14). Our review revealed that 67.5% of studies overall failed to specify which component of bias their trainings were designed to address, although more low (45.5%) than moderate to high bias risk studies (27.3%) specified the bias component. Further, there were more low bias risk studies (31.8%) than moderate to high bias risk studies (14.5%) that reported the target of bias when their trainings were designed to address stereotyping.
Two important elements of successful behavioral change are (i) learning specific strategies to replace old behaviors and (ii) the opportunity to practice the strategies repeatedly over time (17). As proxy measures, we looked at whether implicit bias training used specific hands-on activities and whether the trainings were delivered across multiple days (assuming that attendees are more likely to be given opportunities to rehearse what they learned in earlier sessions). While almost all training programs involved some sort of hands-on activities (81.8 and 98.2% of low versus moderate to high bias risk studies, respectively), less than half of the programs were delivered across multiple days (40.9 and 41.8% of low versus moderate to high bias risk studies, respectively).

T2-T3 translational gaps
Gaps between T2 and T3 are reflected in how researchers assessed (i.e., face validity) and tested the effectiveness (i.e., internal validity) of implicit bias training. First, the majority of the low bias risk studies examined changes in implicit attitudes and/or beliefs (50.0%), whereas the majority of the moderate to high bias risk studies examined changes in explicit attitudes and/or beliefs (69.1%). More critically, the intended goals of implicit bias training are improved provider behaviors (i.e., communication behaviors, clinical recommendations, or both) and/or improved patient outcomes (e.g., increased reports of care satisfaction, improved clinical outcomes, increased subsequent health care utilization). However, only five studies (6.5%) examined behavioral changes among providers, and no study examined patient outcomes, indicating low face validity overall.
Second, the only study design that enables researchers to conclude, with confidence, that their trainings mitigate implicit bias is a randomized clinical trial. Only seven studies (9.1%), which were all low bias risk studies, used a randomized clinical trial to test the internal validity of their trainings. Only three of the seven studies found evidence for a significant change in providers'/trainees' attitudes or behaviors.

T3-T4 translational gap
A gap between T3 and T4 reflects the lack of rigorous testing of external validity. No studies included in the current review tested the external validity of their implicit bias trainings, although some studies (22.7% of low bias risk studies and 29.1% of moderate to high bias risk studies) used a sample that consisted of individuals from different professional statuses (e.g., students, residents, and faculty) and/or health care fields (e.g., medicine, nursing, and dentistry).
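The shares of studies lacking each type of validity follow directly from the counts reported in this section. The sketch below is illustrative only: it assumes, following the proxies described above, that a randomized clinical trial counts as evidence of internal validity and that assessing provider behavior counts as evidence of face validity. The names `studies_with_evidence` and `percent_lacking` are hypothetical.

```python
TOTAL_STUDIES = 77

# Number of studies showing evidence for each validity type, as reported:
# 7 randomized clinical trials, 5 studies assessing provider behavior,
# and 0 studies testing external validity.
studies_with_evidence = {"internal": 7, "face": 5, "external": 0}


def percent_lacking(n_with_evidence, total=TOTAL_STUDIES):
    """Percentage of studies without evidence, rounded to one decimal."""
    return round(100 * (total - n_with_evidence) / total, 1)


lacking = {
    validity: percent_lacking(count)
    for validity, count in studies_with_evidence.items()
}

for validity, pct in lacking.items():
    print(f"{validity} validity lacking in {pct}% of studies")
```

Under these assumptions, 70 of 77 studies lack evidence of internal validity, 72 of 77 lack evidence of face validity, and all 77 lack evidence of external validity.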

DISCUSSION
There are several systematic reviews that examined related topics, such as the prevalence of implicit bias in health care professionals (3,18), consequences of provider implicit bias (2,13,19), and the effectiveness of interventions in reducing implicit bias in general (20) and in health care specifically (21). However, this systematic review differs from prior reviews in that it addresses the question of why implicit bias training may or may not be effective. Specifically, we formally assessed the reliability and validity of 77 studies over a 20-year period to understand whether and to what extent scholars have developed evidence-based implicit bias training with the broad potential to improve patient outcomes. Participants included students, residents, fellows, and faculty/practicing clinicians across 16 clinical specialties. The most common targets of implicit bias addressed in these trainings were race/ethnicity, weight, gender, and sexual orientation.
Our findings indicate that the number of implicit bias trainings has rapidly increased over time and that organizations are beginning to integrate this programming into standard clinical curricula. However, we found that implicit bias training in health care settings is characterized by bias in methodological quality (as assessed with MERSQI and CASP) and several translational gaps, which likely compromise their potential impacts.
First, few of these studies are rigorously grounded in theoretical evidence (T1-T2 gaps). The most common example of this is the failure to identify the component of implicit bias being addressed in the training. Failing to identify the relevant component likely results in a misalignment between training and desired outcomes (improved patient-provider communication and/or treatment recommendations). Further, even among studies that specified components of bias, the majority failed to tailor the material appropriately. For example, research has established that differential endorsement of explicit stereotypes (e.g., Black people being more prone to opioid addiction) contributes to disparities in treatment recommendations for Black patients versus white patients (22,23). It is likely the case (although it still needs to be tested empirically) that differential endorsement of implicit stereotypes also contributes to racial disparities in treatment recommendations. Thus, it is important for researchers to specify which group-based stereotypes and what resulting treatment recommendations the training is designed to address.
We found mixed evidence of best teaching practices in training delivery. On one hand, most implicit bias trainings used a combination of hands-on activities and didactic presentations. This is consistent with findings from prior research suggesting that mitigating implicit bias requires both improvements in health care providers' understanding of implicit bias and awareness of their own bias (likely achieved with didactic presentations) and learning and practicing concrete strategies (likely achieved with hands-on activities). While the training format does not always dictate learning outcomes, more interactive learning activities typically lead to higher-level learning outcomes and more consistent transfer of training to the work environment. On the other hand, however, most training is delivered at a single time point, lasting less than 6 hours on average. While this likely reflects the constraints imposed by the current health care system and medical education (e.g., competing priorities for time, financial costs of taking providers out of clinical care for clinics and hospitals, and costs of taking providers out of clinical care for patients), these findings suggest that attendees were unlikely to have enough opportunities to practice newly learned strategies to mitigate their implicit bias. This further provides evidence that implicit bias training aimed solely at reducing providers' implicit bias is unlikely to be effective or even realistic in mitigating the negative health care consequences of implicit bias within the current health care system (T3-T1 gaps).
Table 2 (fragment). Strong evidence (multiple samples coming from different institutions/events): 0; 0. *When studies used multiple assessment tools, we selected the highest validity tool as the evidence for face validity.
Our review also revealed that only a small number of studies used best research practices that allowed the authors to rigorously assess the efficacy of the implicit bias training (T2-T3 gaps). While the goal of implicit bias training is ultimately to improve provider behaviors and/or patient outcomes, most trainings focused on changing attitudes or beliefs alone. Relatedly, variance in the methodological quality of studies is an important and likely unappreciated barrier to progress in this area. Few studies used validated measures and rigorous empirical approaches such as randomized designs to test efficacy (see tables S3 to S5). We strongly urge that investigators use relevant tools to assess the methodological quality of trainings before testing them at the organizational level. Lastly, given that none of the studies in our review tested the external validity of their programs (T3-T4 gaps), we have no evidence of training effectiveness more broadly.
The current findings must be interpreted in light of the following limitations of the evidence. As reflected in the number of items coded as "unclear," many studies failed to provide the information necessary to code each item included in the current review. It is unclear whether the information was not addressed in the actual implicit bias trainings or the authors simply failed to report it in their publications. Relatedly, only a small number of studies were determined to have low bias risk. Consequently, the stratified results syntheses were based on a highly unbalanced number of studies between the low and moderate to high bias risk groups.
There are also limitations with the review processes that might have potentially affected the interpretation of the results. First, some of the questions in the MERSQI rely on reviewers' subjective judgments. For example, a binary rating (yes/no) of "appropriateness of analysis" heavily depends on reviewers' knowledge. In addition, there is a lack of clarity in the literature over which quantified CASP scores indicate high, moderate, and low bias (24). Despite these limitations, our approach to the assessment of study bias risk is consistent with the current paradigm and enables us to explore the strengths and weaknesses of both individual studies and the state of research.
Second, the report and interpretation of the implicit bias trainings were based on details provided in the publication, which were often sparse. Very few studies provided sufficient information to verify the instructional methods or content covered. Lack of materials sharing limits replication and reproducibility of the study findings, as well as interpretation and generalizability in this review.
Does implicit bias training work? Should health care educational institutions and organizations as well as the government continue to spend their efforts on it? Our conclusion based on the findings from the current systematic review is that it is premature to answer these questions. There is little scientific evidence to support that implicit bias training improves the quality of patient care, and this could be due to three reasons. First, some trainings may actually improve patient care, but they simply did not use proper outcome assessments (i.e., T2-T3 translational gaps). Second, implicit bias trainings just do not work because of translational gaps between T1 and T2. Third, even well-designed implicit bias training that is validated outside the health care context may not be effective in reducing provider implicit bias because of the gaps between T3 back to T1. It is critical for health care educational institutions and organizations to both expand the objectives of implicit bias training and (re)evaluate their implicit bias trainings by using the CTS framework before they (further) devote their time and resources to implicit bias training. Similarly, T3-T4 translational gaps found in all reviewed studies suggest that rigorous testing of external validity of the implicit bias trainings currently used in eight states is urgently needed before more states start to mandate such training.

MATERIALS AND METHODS
Eligibility criteria
We defined implicit bias training for health care trainees/providers as the action of teaching trainees/providers the knowledge and skills to either reduce their negative attitudes toward and/or beliefs about a certain social group or mitigate the health care consequences of their bias(es). This is not to imply that implicit bias training programs with other objectives, such as promoting DEI within the health care system (25,26) or reducing interprofessional conflicts (27,28), are not important. However, the training designs, contents, and the assessments of the reliability and validity of the implicit bias training would vary depending on the training objectives. Therefore, this systematic review focused on implicit bias training aimed primarily at reducing disparities in the quality of health care among patients from diverse social groups. Given this focus, we limited our review to training targeting health care trainees/providers with direct patient contact. We also excluded undergraduates pursuing careers in health care because they generally do not have direct patient contact.
We excluded studies from the review if they did not clearly state that addressing implicit bias was one of their goals (simply mentioning implicit bias as one training component was insufficient). We also limited our review to published empirical studies or education interventions written in English. We excluded published theses/dissertations and conference abstracts because they generally do not undergo rigorous review processes. Last, we excluded studies published before 2003 to correspond with the publication of the Institute of Medicine's seminal report "Unequal Treatment" (1).

Information sources and search strategy
We conducted the initial search on 1 September 2021 and the 1-year follow-up search on 21 September 2022 in the following databases: Medline (Ovid), Embase (Ovid), PsycINFO (ProQuest), Cumulative Index to Nursing and Allied Health Literature (CINAHL) [Elton B. Stephens Company (EBSCO)], Social Work Abstracts (EBSCO), Scopus (Elsevier), and MedEdPortal. Searches included terms, phrases, and controlled vocabulary related to the concepts of implicit bias and health care professionals, including practitioners, trainees, and levels of training. See table S1 for full search strategies.
Coding discrepancies were discussed and resolved by the two reviewers first and with all coauthors where necessary. Next, two reviewers independently read full texts of "include" or "maybe" studies and determined whether the studies met all eligibility criteria. Once again, conflicts were first resolved by the two reviewers and then with the coauthors when necessary.

Data collection process
Two reviewers independently extracted information for data synthesis that included bias component (prejudice, stereotyping, or both), sample size, sample professional characteristics (trainee/provider status and health care fields), training structure (format, duration, and frequency), and outcomes. Primary outcomes were broadly categorized as follows: changes in (i) explicit bias; (ii) implicit bias; (iii) trainee/provider behaviors; and (iv) patient outcomes. We also included self-reflections (e.g., self-awareness of bias, motivation to reduce bias, and intention to improve patient care) as secondary outcomes. Discrepancies were resolved through discussion, first between the two reviewers and then with a third reviewer where necessary.

Study risk of bias assessment
Two reviewers independently assessed risk of bias of quantitative and qualitative studies by using the MERSQI (30) and the CASP checklist (31), respectively. Discrepancies were resolved through discussion, first between the two reviewers and then with a third reviewer where necessary. We computed the total score of 10 items in MERSQI and the percentage of "yes" across 10 items in CASP. Last, we used the cutoff scores of MERSQI ≥12 (32) and CASP ≥66.7% (24) to define "low bias" (versus "moderate to high bias").
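The cutoff rule described above can be sketched as follows. This is a minimal illustration, not the authors' scoring procedure: the function names and the example item ratings are hypothetical, and because the text does not state how the two instruments were combined for mixed-methods studies, each instrument is classified separately here.

```python
MERSQI_CUTOFF = 12.0  # total MERSQI score at or above this is "low bias"
CASP_CUTOFF = 66.7    # percentage of "yes" at or above this is "low bias"


def mersqi_total(item_scores):
    """Sum of the MERSQI item scores for one study."""
    return sum(item_scores)


def casp_percent_yes(answers):
    """Percentage of 'yes' answers across the CASP items for one study."""
    return 100.0 * sum(answer == "yes" for answer in answers) / len(answers)


def bias_risk(score, cutoff):
    """Classify a study as 'low' or 'moderate to high' bias risk."""
    return "low" if score >= cutoff else "moderate to high"


# Hypothetical item ratings for illustration only (not data from the review).
example_mersqi_items = [1, 0.5, 1.5, 1, 1, 1, 0, 1, 1, 1.5]
example_casp_answers = ["yes"] * 7 + ["no"] * 2 + ["can't tell"]

mersqi_score = mersqi_total(example_mersqi_items)
casp_score = casp_percent_yes(example_casp_answers)

print("MERSQI:", mersqi_score, "->", bias_risk(mersqi_score, MERSQI_CUTOFF))
print("CASP:", casp_score, "->", bias_risk(casp_score, CASP_CUTOFF))
```

With a 10-item CASP checklist, the 66.7% cutoff means a study needs at least 7 "yes" answers to be rated low bias risk.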

Synthesis methods
We anticipated a high level of heterogeneity in the findings; thus, we decided to provide a narrative synthesis of the findings. To account for this variability, we created a comprehensive tabulation of study characteristics, structured around intended goals of training, the training content, bias component, and outcomes. We also summarized internal validity (the extent to which cause-and-effect relationships between training and outcomes are established with confidence), face validity (the degree to which outcomes assessed in a study are consistent with the intended goal of the study), and external validity (the extent to which training produces the same outcomes in different populations and settings). The findings were stratified according to bias risk using subgroup analysis. The procedure was preregistered with PROSPERO (CRD42021270641).

Fig. 1. Identification of studies. Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) diagram illustrating selection and review process of articles related to implicit bias training in health care.

Fig. 2. Published studies on implicit bias training. Number of empirical studies on implicit bias training published between January 2003 and September 2022, by year.

Table 1. Summary of training characteristics stratified by the levels of study bias risk (low versus moderate to high).
Training characteristics | Low bias risk (n = 22) | Moderate to high bias risk (n = 55)
Component of implicit bias | |
Training duration* | Range: 34-3030 min; M = 409.31 (SD = 753.05) | Range: 50-1680 min; M = 308.51 (SD = 379.71)
*When studies reported ranges for training duration, we used the corresponding middle values to compute means and SDs.
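The midpoint rule in the Table 1 footnote can be illustrated with a short sketch. The durations below are hypothetical, not data from the review, and the helper name `to_minutes` is an assumption. The source does not say whether the sample or population SD was used; the sample SD (`statistics.stdev`) is assumed here.

```python
from statistics import mean, stdev


def to_minutes(duration):
    """Return a single duration in minutes.

    A duration reported as a (low, high) range is replaced by its
    middle value; a single number is returned unchanged.
    """
    if isinstance(duration, tuple):
        low, high = duration
        return (low + high) / 2
    return duration


# Hypothetical reported durations in minutes; tuples are reported ranges.
reported_durations = [60, (90, 150), 480, (30, 60)]

durations_in_minutes = [to_minutes(d) for d in reported_durations]
mean_duration = mean(durations_in_minutes)
sd_duration = stdev(durations_in_minutes)  # sample SD (assumption)

print("Durations used:", durations_in_minutes)
print("Mean:", mean_duration)
print("SD:", round(sd_duration, 2))
```

Replacing a range with its midpoint keeps every study in the summary statistics, at the cost of understating the variability within studies that reported ranges.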